
npj Systems Biology and Applications

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match the content profile of npj Systems Biology and Applications, based on 99 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is already an above-average fit.

1
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 3%
0.7%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines are poorly matched to the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis will generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable often differed between the two cohorts, indicating that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework for identifying safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

2
APOE4 Allele Frequencies Show Dramatic Variation Across Indian Populations

Ramdas, S.; Kahali, B.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.09.26350483 medRxiv
Top 4%
0.5%

The APOE ε4 allele is the strongest genetic risk factor for Alzheimer's disease. However, its distribution across Indian populations is poorly characterized. We analyze APOE allele frequencies in 9,524 individuals from 83 distinct populations in the GenomeIndia dataset. ε4 frequencies vary widely across populations within India, ranging from 2.7% to 36.1%, with a median of 11%. Tribal populations have higher ε4 frequencies than non-tribal groups, while Tibeto-Burman populations have significantly lower frequencies. One tribal population from the northern coastal highlands has an ε4 frequency of 0.36, with 59% of individuals being carriers. ε4 carrier status correlates significantly with lipid phenotypes, including LDL, HDL, total cholesterol, and triglycerides. Collectively, these findings reveal exceptional genetic diversity in Alzheimer's disease risk across India and have important implications for population-specific screening strategies, genetic counseling, and precision-medicine approaches to dementia prevention.
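As a quick consistency check on the figures above, the expected share of ε4 carriers can be computed from the allele frequency alone if one assumes Hardy-Weinberg equilibrium. The abstract does not state this assumption, and small or endogamous populations can violate it. Under that assumption the carrier fraction is 1 - (1 - p)^2, so p = 0.36 gives 1 - 0.64^2 = 0.5904, in line with the reported 59% carriers. The function name below is illustrative only.

```python
def carrier_fraction_hwe(p: float) -> float:
    """Expected fraction of individuals carrying at least one copy of an allele
    with frequency p, ASSUMING Hardy-Weinberg equilibrium (random mating)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # Under HWE, a person carries no e4 allele with probability (1 - p)^2.
    return 1.0 - (1.0 - p) ** 2


# Lowest frequency, median, and the high-frequency tribal population from the abstract.
for p in (0.027, 0.11, 0.36):
    print(f"e4 allele frequency {p:.3f} -> expected carriers {carrier_fraction_hwe(p):.1%}")
```

The same formula puts the 2.7% and 11% allele frequencies at roughly 5.3% and 20.8% expected carriers, respectively.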

3
REDDI: A Riemannian Ensemble Learning Framework for Interpretable Differential Diagnosis of Neurodegenerative Diseases

Roca, M.; Messuti, G.; Klepachevskyi, D.; Angiolelli, M.; Bonavita, S.; Trojsi, F.; Demuru, M.; Troisi Lopez, E.; Chevallier, S.; Yger, F.; Saudargiene, A.; Sorrentino, P.; Corsi, M.-C.

2026-04-12 neurology 10.64898/2026.04.10.26350617 medRxiv
Top 4%
0.5%

Neurodegenerative diseases such as Mild Cognitive Impairment (MCI), Multiple Sclerosis (MS), Parkinson's Disease (PD), and Amyotrophic Lateral Sclerosis (ALS) are becoming more prevalent. Each of these diseases, despite its specific pathophysiological mechanisms, leads to widespread reorganization of brain activity. However, the corresponding neurophysiological signatures of these changes have remained elusive. Consequently, it is not yet possible to effectively distinguish these diseases from neurophysiological data alone. This work uses resting-state Magnetoencephalography (MEG) data, combined with interpretable machine learning techniques, to support differential diagnosis. We expand on previous work and design a Riemannian geometry-based classification pipeline. The pipeline is fed with typical connectivity metrics, such as covariance or correlation matrices. To maintain interpretability while reducing feature dimensionality, we introduce a classifier-independent feature selection procedure that uses effect sizes derived from the Kruskal-Wallis test. The ensemble classification pipeline, called REDDI, achieved a mean balanced accuracy of 0.81 (±0.04) across five folds, a 13% improvement over the state of the art, while remaining clinically transparent. As such, our approach provides a reliable, interpretable, data-driven, and operator-independent decision-support tool for neurology.
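The abstract does not specify which Kruskal-Wallis effect size REDDI uses or exactly what its feature vectors are (Riemannian pipelines often work with tangent-space projections of covariance matrices). The sketch below assumes the common rank-based effect size eta^2_H = (H - k + 1) / (N - k), where H is the Kruskal-Wallis statistic, k the number of diagnostic groups, and N the number of subjects, and simply keeps the top-scoring features. Because the ranking never consults a classifier, it is classifier-independent in the sense described above. All function names are hypothetical, and H is computed without the tie correction that library implementations such as `scipy.stats.kruskal` apply.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def eta_squared_h(groups: Sequence[Sequence[float]]) -> float:
    """Rank-based effect size eta^2_H = (H - k + 1) / (N - k)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    return (kruskal_wallis_h(groups) - k + 1) / (n - k)


def select_features(X: Sequence[Sequence[float]], y: Sequence[int], top_k: int) -> List[int]:
    """Return indices of the top_k columns of X ranked by eta^2_H across labels y."""
    labels = sorted(set(y))
    scored = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scored.append((eta_squared_h(groups), j))
    scored.sort(reverse=True)  # highest effect size first
    return [j for _, j in scored[:top_k]]


# Toy data: 3 diagnostic groups x 3 subjects; column 0 is informative, column 1 is noise.
X = [[1.0, 0.5], [1.2, 0.1], [1.1, 0.9],
     [5.0, 0.4], [5.2, 0.8], [5.1, 0.2],
     [9.0, 0.3], [9.3, 0.7], [9.1, 0.6]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(select_features(X, y, top_k=1))
```

On the toy data, feature 0 separates the three groups perfectly (H = 7.2, eta^2_H ≈ 0.87), while feature 1 is noise and scores below zero, so only feature 0 is kept.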

4
Noisy periodicity in tropical respiratory disease dynamics

Yang, F.; Hanks, E. M.; Conway, J. M.; Bjornstad, O. N.; Thanh, N. T. L.; Boni, M. F.; Servadio, J. L.

2026-04-13 epidemiology 10.64898/2026.04.10.26350660 medRxiv
Top 4%
0.4%

Infectious disease surveillance systems in tropical countries show that respiratory disease incidence generally manifests as year-round activity with weak fluctuations and irregular seasonality. Previously, using a ten-year time series of influenza-like illness (ILI) collected from outpatient clinics in Ho Chi Minh City (HCMC), Vietnam, we found a combination of nonannual and annual signals driving these dynamics, but with unknown mechanisms. In this study, we use seven stochastic dynamical models incorporating humidity, temperature, and school term to investigate plausible mechanisms behind these annual and nonannual incidence trends. We use iterated filtering to fit the models and evaluate them by comparing how well they replicate the combination of annual and nonannual signals. We find that a model including specific humidity, temperature, and school term best fits our observed data from HCMC and partially reproduces the irregular seasonality. The estimated effects of specific humidity and temperature on transmission are negative and nonlinear but weak. School dismissal is associated with decreased transmission, but this effect is also small. Under these weak external drivers, we hypothesize that stochasticity makes a strong sub-annual cycle more likely to be observed in ILI disease dynamics. Our study shows a possible mechanism for respiratory disease dynamics in the tropics: when the external drivers are weak, the seasonality of respiratory disease dynamics is susceptible to stochasticity.

5
A geometric-surface PDE model for cell-nucleus translocation through confinement

Ballatore, F.; Madzvamuse, A.; Jebane, C.; Helfer, E.; Allena, R.

2026-04-17 biophysics 10.64898/2025.12.18.695144 medRxiv
Top 4%
0.4%

Understanding how cells migrate through confined environments is crucial for elucidating fundamental biological processes, including cancer invasion, immune surveillance, and tissue morphogenesis. The nucleus, as the largest and stiffest cellular organelle, often limits cellular deformability, making it a key factor in migration through narrow pores or highly constrained spaces. In this work, we introduce a geometric surface partial differential equation (GS-PDE) model in which the cell plasma membrane and nuclear envelope are described as evolving energetic closed surfaces governed by force-balance equations. We replicate the results of a biophysical experiment, where a microfluidic device is used to impose compressive stresses on cells by driving them through narrow microchannels under a controlled pressure gradient. The model is validated by reproducing cell entry into the microchannels. A parametric sensitivity analysis highlights the dominant influence of specific parameters, whose accurate estimation is essential for faithfully capturing the experimental setup. We found that surface tension and confinement geometry emerge as key determinants of translocation efficiency. Although tailored to this specific setup for validation purposes, the framework is sufficiently general to be applied to a broad range of cell mechanics scenarios, providing a robust and flexible tool for investigating the interplay between cell mechanics and confinement. It also offers a solid foundation for future extensions integrating more complex biochemical processes such as active confined migration.

6
Quantum-Refined Latent Diffusion: A Hybrid Generative Framework for Imbalanced ECG Classification

Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.

2026-04-13 cardiovascular medicine 10.64898/2026.04.09.26350502 medRxiv
Top 5%
0.3%

Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.
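The abstract names Maximum Mean Discrepancy (MMD) as the criterion that aligns synthetic latents with the real class-specific latent banks. The sketch below shows only that criterion: a biased squared-MMD estimate with a Gaussian kernel, plus a hypothetical `bounded_residual` helper in which a `tanh`-capped update stands in for the parameterized quantum circuit. The kernel, bandwidth, and update rule are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, bandwidth: float) -> float:
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def bounded_residual(z: Vector, delta: Vector, max_step: float) -> List[float]:
    """HYPOTHETICAL refinement step: add a per-coordinate correction whose magnitude
    is capped at max_step by tanh (a stand-in for the quantum-circuit output)."""
    return [zi + max_step * math.tanh(di) for zi, di in zip(z, delta)]


# Toy 2-D "latent bank" of real minority-class beats and an offset synthetic batch.
real_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [[x + 1.5, y + 1.5] for x, y in real_bank]
# Push each synthetic latent back toward the bank, by at most 1.0 per coordinate.
refined = [bounded_residual(z, [-3.0, -3.0], max_step=1.0) for z in synthetic]

print(f"MMD^2 before refinement: {mmd2(real_bank, synthetic):.4f}")
print(f"MMD^2 after refinement:  {mmd2(real_bank, refined):.4f}")
```

Moving the synthetic batch toward the real bank lowers the squared MMD, which is the quantity a refinement module of this kind would be trained to minimize; in practice the correction would be learned rather than fixed as it is here.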

7
A safer fluorescent in situ hybridization protocol for cryosections

Chihara, A.; Mizuno, R.; Kagawa, N.; Takayama, A.; Okumura, A.; Suzuki, M.; Shibata, Y.; Mochii, M.; Ohuchi, H.; Sato, K.; Suzuki, K.-i. T.

2026-04-16 molecular biology 10.1101/2025.05.25.655994 medRxiv
Top 6%
0.3%

Fluorescent in situ hybridization (FISH) enables highly sensitive, high-resolution detection of gene transcripts. Moreover, by employing multiple probes, this technique allows for multiplexed, simultaneous detection of distinct gene expression patterns spatiotemporally, making it a valuable spatial transcriptomics approach. Owing to these advantages, FISH techniques are rapidly being adopted across diverse areas of basic biology. However, conventional protocols often rely on volatile, toxic reagents such as formalin or methanol, posing potential health risks to researchers. Here, we present a safer protocol that replaces these chemicals with low-toxicity alternatives, without compromising the high detection sensitivity of FISH. We validated this protocol using both in situ hybridization chain reaction (HCR) and signal amplification by exchange reaction (SABER)-FISH in frozen sections of various model organisms, including mouse (Mus musculus), amphibians (Xenopus laevis and Pleurodeles waltl), and medaka (Oryzias latipes). Our results demonstrate successful multiplexed detection of morphogenetic and cell-type marker genes in these model animals using this safer protocol. The protocol has the additional advantage of requiring no proteolytic enzyme treatment, thus preserving tissue integrity. Furthermore, we show that this protocol is fully compatible with EGFP immunostaining, allowing for the simultaneous detection of mRNAs and reporter proteins in transgenic animals. This protocol retains the benefits of highly sensitive, multiplexed, and multimodal detection afforded by integrating in situ HCR and SABER-FISH with immunohistochemistry, while providing a safer option for researchers, thereby offering a valuable tool for basic biology.

8
Imaging Mass Cytometry (IMC) as a Tool to Characterize Circulating Tumor Cells (CTCs) in Preclinical Mouse Models

Pore, M.; Balamurugan, K.; Atkinson, A.; Breen, D.; Mallory, P.; Cardamone, A.; McKennett, L.; Newkirk, C.; Sharan, S.; Bocik, W.; Sterneck, E.

2026-04-16 cancer biology 10.64898/2025.12.18.695262 medRxiv
Top 6%
0.3%

Circulating tumor cells (CTCs), and especially CTC-clusters, are linked to poor prognosis and may reveal mechanisms of metastasis and treatment resistance. Therefore, developing unbiased methods for the functional characterization of CTCs in liquid biopsies is an urgent need. Here, we present an evaluation of multiplex imaging mass cytometry (IMC) to analyze CTCs in mice with human xenograft tumors. In a single-step process, IMC uses metal-labeled antibodies to simultaneously detect a large number of proteins/modifications within minimally manipulated small volumes of blood from the tail vein or heart. We used breast cancer cell lines and a patient-derived xenograft (PDX) to assess antibodies for cross-species interpretation. Along with manual verification, HALO-AI-based cell segmentation was used to identify CTCs and quantify markers. Despite some limitations regarding human-specificity, this technology can be used to investigate the effect of genetic and pharmacological interventions on the properties of single and cluster CTCs in tumor-bearing mice.

9
Multi-task deep learning integrating pretreatment MRI and whole slide images predicts induction chemotherapy response and survival in locally advanced nasopharyngeal carcinoma

Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.

2026-04-11 radiology and imaging 10.64898/2026.04.07.26350350 medRxiv
Top 7%
0.2%

Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the train, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC.

Author Summary: We have developed a deep learning model that integrates two types of medical images, namely magnetic resonance imaging (MRI) and digital pathology slides, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease.
Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-modality models and TNM staging. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in personalizing treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat maps, which highlight the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.

10
Heterogeneous, Population-Level Drug-Tolerant Persisters Exhibit Ion-Channel Remodeling and Ferroptosis Susceptibility

Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.

2026-04-13 systems biology 10.1101/2022.02.03.479045 medRxiv
Top 7%
0.2%
Show abstract

Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."
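The population-level view can be made concrete with a little arithmetic. If each sub-state i holds a fraction w_i of the cells and divides and dies at rates b_i and d_i, the population's instantaneous net growth rate is the weighted mean of b_i - d_i, which can sit near zero even when no individual sub-state is static. The rates below are hypothetical, not values from the paper, and the sketch ignores transitions between sub-states, which would be needed to keep the fractions themselves stable over time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SubState:
    """One phenotypic sub-state within a drug-tolerant population (hypothetical)."""
    fraction: float       # share of cells in this sub-state
    division_rate: float  # divisions per cell per day
    death_rate: float     # deaths per cell per day


def net_growth_rate(states: Sequence[SubState]) -> float:
    """Population-level net growth rate: fraction-weighted mean of (division - death)."""
    total = sum(s.fraction for s in states)
    return sum(s.fraction * (s.division_rate - s.death_rate) for s in states) / total


def is_idling(states: Sequence[SubState], tol: float = 1e-3) -> bool:
    """Call the population 'idling' if its net growth rate is within tol of zero."""
    return abs(net_growth_rate(states)) < tol


# Hypothetical rates: one expanding sub-state balanced by two contracting ones.
population = [
    SubState(fraction=0.5, division_rate=0.30, death_rate=0.20),
    SubState(fraction=0.3, division_rate=0.05, death_rate=0.15),
    SubState(fraction=0.2, division_rate=0.05, death_rate=0.15),
]
print(f"net growth rate: {net_growth_rate(population):.4f} per day")
print("idling:", is_idling(population))
```

Here each sub-state has a clearly nonzero net rate (+0.10 or -0.10 per day), yet the weighted contributions cancel, which is the kind of balance the idling description refers to.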

11
A multimodal AI model for modeling the genetic risk factor of Alzeihmer's disease

Nguyen, T. M.; Woods, C.; Liu, J.; Wang, C.; Lin, A.-L.; Cheng, J.

2026-04-15 health informatics 10.64898/2026.04.13.26350803 medRxiv
Top 7%
0.2%

The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset Alzheimer's disease (AD), the most common form of dementia. APOE4 carriers exhibit cerebrovascular and metabolic dysfunction, structural brain alterations, and gut microbiome changes decades before the onset of clinical symptoms. A better understanding of the early manifestation of these physiological changes is critical for the development of timely AD interventions and risk reduction protocols. Multimodal datasets encompassing a wide range of APOE4- and AD-associated biomarkers provide a valuable opportunity to gain insight into the APOE4 phenotype; however, these datasets often present analytical challenges due to small sample sizes and high heterogeneity. Here, we propose a two-stage multimodal AI model (APOEFormer) that integrates blood metabolites, brain vascular and structural MRI, microbiome profiles, and other clinical and demographic data to predict APOE4 allele status. In the first stage, modality-specific encoders are used to generate initial representations of input data modalities, which are aligned in a shared latent space via self-supervised contrastive learning during pretraining. This objective encourages the learning of informative and consistent representations across modalities by leveraging cross-modality relationships. In the second stage, the pretrained representations are used as inputs to a multimodal transformer that integrates information across modalities to predict a key AD risk genetic variant (APOE4). Across 10 independent experimental runs with different train-validation-test splits, APOEFormer predicts whether an individual carries an APOE4 allele with an average accuracy of 75%, demonstrating robust performance under limited sample sizes.
Post hoc perturbation analysis of the predictive model revealed valuable insights into the driving components of the APOE4 phenotype, including key blood biomarkers and brain regions strongly associated with APOE4.
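The abstract says only that modality-specific embeddings are aligned by self-supervised contrastive learning. One widely used objective for this is a symmetric, CLIP-style InfoNCE loss, sketched below for two modalities in plain Python; treating it as APOEFormer's actual loss, and the temperature of 0.1, are assumptions. Row i of each input is the embedding of subject i, so matched rows are positives and every other subject in the batch serves as a negative.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def diagonal_cross_entropy(logits: Matrix) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def symmetric_info_nce(za: Matrix, zb: Matrix, temperature: float = 0.1) -> float:
    """CLIP-style contrastive loss between paired embeddings of two modalities (ASSUMED objective)."""
    a = [l2_normalize(v) for v in za]
    b = [l2_normalize(v) for v in zb]
    # Cosine similarity of every subject in modality A with every subject in modality B.
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    logits_t = [list(col) for col in zip(*logits)]  # B-to-A direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Hypothetical embeddings for 3 subjects from two modalities (e.g. metabolites, MRI).
metabolite_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mri_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
mri_shuffled = [mri_aligned[1], mri_aligned[2], mri_aligned[0]]  # subject pairing broken

print(f"aligned pairs:  {symmetric_info_nce(metabolite_emb, mri_aligned):.4f}")
print(f"shuffled pairs: {symmetric_info_nce(metabolite_emb, mri_shuffled):.4f}")
```

Correctly paired embeddings give a much lower loss than shuffled ones. With three or more modalities (metabolites, MRI, microbiome), one option is to sum this loss over modality pairs; how the paper combines modalities is not stated.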

12
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 8%
0.2%
Show abstract

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

13
From Chaos to Care: Personalized AI for Early Cardiac Arrhythmia Warning

Halder, S.; Kim, C. M.; Periwal, V.

2026-04-10 cardiovascular medicine 10.64898/2026.04.08.26350403 medRxiv
Top 9%
0.2%
Show abstract

Cardiac arrhythmias are abnormal heart rhythms characterized by disordered electrical dynamics that impair cardiac function and pose a major global burden of morbidity and mortality. Early and accurate prediction of arrhythmic anomalies from physiological time series is crucial for effective intervention, yet remains challenging due to the nonlinear, nonstationary, and individualized nature of cardiac dynamics. Despite significant advances in machine learning-based arrhythmia detection, most existing methods operate as static classifiers on electrocardiographic signals and lack online prediction, patient-specific adaptation, and mechanistic interpretability. From a dynamical-systems perspective, arrhythmias represent qualitative regime transitions, often preceded by subtle, temporally extended deviations that are difficult to detect in real time. Here we introduce CASCADE (Chaotic Attractor Sensitivity for Cardiac Anomaly Detection), an online and personalized anomaly forecasting framework built on a special type of reservoir computing called Dynamical Systems Machine Learning (DynML). DynML employs ensembles of continuous-time nonlinear dynamical systems as chaotic reservoirs to reconstruct and forecast short-term cardiac dynamics on a beat-to-beat basis, training only a linear readout. This design enables efficient online adaptation without retraining the underlying dynamical model. Rather than relying on static beat-level classification, CASCADE identifies arrhythmic events as failures of short-term predictability, manifested as statistically significant deviations between predicted and observed dynamics relative to subject-specific baselines. Detection performance is governed by the intrinsic dynamical complexity of the reservoir, quantified by topological entropy. 
Reservoirs operating near critical entropy regimes optimally amplify subtle, temporally extended irregularities in heartbeat dynamics, rendering incipient arrhythmic signatures linearly separable at the readout level. Topological entropy thus serves both as a predictor of model performance and a principled control parameter for reservoir design. When evaluated on the MIT-BIH Arrhythmia dataset, CASCADE achieved consistently high F1 scores, precision, recall, and overall accuracy across diverse patient populations, demonstrating strong generalizability across clinical and real-world settings. By integrating chaotic reservoir computing, entropy-guided tuning, and online personalized forecasting, CASCADE reframes arrhythmia detection as a problem of dynamical regime transition rather than static classification. This perspective provides a scalable, interpretable, and computationally efficient framework for real-time cardiac monitoring and early-warning clinical decision support.
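The detection rule described above, flagging beats whose prediction error departs significantly from a subject-specific baseline, can be sketched independently of the reservoir that produces the forecasts. In this sketch the predictions are taken as given. The use of absolute error, a z-score against the first `baseline_len` beats, and a threshold of 3 are illustrative assumptions rather than CASCADE's published criteria, and the RR-interval values are invented.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_unpredictable_beats(
    predicted: Sequence[float],
    observed: Sequence[float],
    baseline_len: int,
    z_threshold: float = 3.0,
) -> List[int]:
    """Return indices of beats whose forecast error is anomalously large relative to
    the error distribution over the first baseline_len beats (subject-specific)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if baseline_len < 2:
        raise ValueError("need at least two baseline beats to estimate spread")
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    baseline = errors[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-12  # guard against a perfectly flat baseline
    return [
        i for i in range(baseline_len, len(errors))
        if (errors[i] - mu) / sigma > z_threshold  # one-sided: only worse-than-usual errors
    ]


# Invented RR intervals (seconds); in CASCADE the forecasts would come from the reservoir.
predicted = [0.80] * 12
observed = [0.80, 0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81,  # baseline beats
            0.80, 0.79, 0.60, 0.80]                          # monitored beats
print(flag_unpredictable_beats(predicted, observed, baseline_len=8))
```

Only the beat whose observed interval departs sharply from the forecast (index 10) is flagged; ordinary beat-to-beat jitter of about 0.01 s stays within the baseline spread.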

14
Artificial Intelligence-Driven Identification of Age- and Treatment-Specific TP53 and PI3K Alterations in Pancreatic Ductal Adenocarcinoma

Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.

2026-04-11 gastroenterology 10.64898/2026.04.07.26350355 medRxiv
Top 9%
0.2%

Background: Despite extensive characterization of key oncogenic drivers, pancreatic ductal adenocarcinoma (PDAC) continues to exhibit profound molecular heterogeneity and inconsistent responses to standard therapies, including gemcitabine. The role of pathway-level alterations, particularly in the context of age at onset and therapeutic exposure, remains insufficiently defined.

Methods: In this study, we leveraged a conversational artificial intelligence framework (AI-HOPE-TP53 and AI-HOPE-PI3K) to enable precision-oncology-driven interrogation of clinical and genomic data from 184 PDAC tumors, stratified by age at diagnosis and gemcitabine exposure. Using AI-enabled cohort construction and pathway-centric analyses, we evaluated alterations in TP53 and PI3K signaling networks, with findings validated through conventional statistical methods.

Results: TP53 pathway analysis revealed a significantly higher frequency of TP53 mutations in early-onset compared to late-onset PDAC among gemcitabine-treated patients (86.7% vs. 57.1%, p = 0.04), with a similar trend observed between treated and untreated early-onset cases (86.7% vs. 40%, p = 0.07). Notably, in late-onset PDAC patients not treated with gemcitabine, absence of TP53 pathway alterations was associated with improved overall survival (p = 0.011). Complementary analyses of the PI3K pathway demonstrated a higher prevalence of pathway alterations in late-onset gemcitabine-treated tumors compared to untreated counterparts (13.2% vs. 2.7%, p = 0.02). Importantly, among late-onset patients not receiving gemcitabine, those without PI3K pathway alterations exhibited significantly improved overall survival (p < 0.0001).

Conclusion: Together, these findings identify distinct TP53 and PI3K pathway dependencies that are modulated by both age of onset and treatment exposure in PDAC.
This work highlights the utility of conversational artificial intelligence in enabling rapid, integrative, and hypothesis-generating analyses within a precision oncology framework, supporting the identification of clinically relevant molecular stratification strategies for this aggressive disease.

15
Patient-Centred Communication in Lung Cancer Screening: A Clinically Focussed Evaluation of a Fine-Tuned Open-Source Model Against a Larger Frontier System

Khanna, S.; Chaudhary, R.; Narula, N.; Lee, R.

2026-04-11 oncology 10.64898/2026.04.10.26350595 medRxiv
Top 10%
0.1%
Show abstract

Lung cancer screening saves lives, yet uptake remains suboptimal and inequitable. Personalised communication can improve attendance and reduce anxiety, but scaling such support is a workforce challenge. We fine-tuned Google's Gemma 2 9B using QLoRA on 5,086 synthetic screening conversations and compared it against Google's Gemini 2.5 Flash (a larger frontier model) and an unmodified baseline across 300 multi-turn conversations with 100 patient personas spanning ten clinical categories. Evaluation combined automated natural language processing metrics with independent language model judgement in two complementary modes: structured clinical rubric and simulated patient persona. The fine-tuned model achieved the highest simulated patient experience score (3.71/5 vs 3.65 for the frontier model), recorded zero boundary violations after clinician review of all flagged instances, and led on the four most safety-critical categories. A composite Patient Adaptation Index showed that the fine-tuned model led overall (0.37 vs 0.35 vs 0.35), with its clearest advantage on the two clinically specific components: empathy calibration to patient distress and selective smoking cessation signposting. These findings suggest that targeted fine-tuning of open-source models can yield clinical communication quality comparable to larger proprietary systems, with advantages in safety-critical scenarios and suitability for NHS data governance constraints. Human clinician review of these conversations is ongoing.

16
A Replicable NeuroMark Template for Whole-Brain SPECT Reveals Data-Driven Perfusion Networks and Their Alterations in Schizophrenia

Harikumar, A.; Baker, B.; Amen, D.; Keator, D.; Calhoun, V. D.

2026-04-12 psychiatry and clinical psychology 10.64898/2026.04.08.26349985 medRxiv
Top 10%
0.1%
Show abstract

Single photon emission computed tomography (SPECT) is a highly specialized imaging modality that enables measurement of regional cerebral perfusion and, in particular, resting cerebral blood flow (rCBF). Recent technological advances have improved SPECT quantification and reliability, making it increasingly useful for studying rCBF abnormalities and perfusion network alterations in psychiatric and neurological disorders. To characterize large-scale functional organization in SPECT data, data-driven decomposition methods such as independent component analysis (ICA) have been used to extract covarying perfusion patterns that map onto interpretable brain networks. Blind ICA provides a data-driven approach to estimate these networks without strong prior assumptions. More recently, a hybrid approach that leverages spatial priors to guide a spatially constrained ICA (sc-ICA) has been used to fully automate the ICA analysis while also providing participant-specific network estimates. While this has been reliably demonstrated in fMRI with the NeuroMark template, there is currently no comparable SPECT template. A SPECT template would enable automatic estimation of functional SPECT networks with participant-specific expressions that correspond across participants and studies. The current study introduces a new replicable NeuroMark SPECT template for estimating canonical perfusion covariance patterns (networks). We first identify replicable SPECT networks using blind ICA applied to two large-sample SPECT datasets. We then demonstrate the use of the resulting template by applying sc-ICA to an independent schizophrenia dataset. In sum, this work presents and shares the first NeuroMark SPECT template and demonstrates its utility in an independent cohort, providing a scalable and robust framework for network-based analyses.

17
Analytical Choices Impact the Estimation of Rhythmic and Arrhythmic Components of Brain Activity

da Silva Castanheira, J.; Landry, M.; Fleming, S. M.

2026-04-11 neuroscience 10.1101/2025.09.24.678322 medRxiv
Top 11%
0.1%

Brain activity comprises both rhythmic (periodic) and arrhythmic (aperiodic) components. These signal elements vary with healthy aging and disease, and may make distinct contributions to conscious perception. Despite pioneering techniques to parameterize rhythmic and arrhythmic neural components from power spectra, methods for quantifying rhythmic activity remain in their infancy. Previous work has relied either on parametric estimates of rhythmic power extracted with specparam or on estimates of rhythmic power obtained after detrending neural spectra. This variation in analytical choices for isolating brain rhythms from background arrhythmic activity makes findings difficult to compare across studies. Whether current approaches can accurately recover the independent contributions of these signal elements remains to be established. Here, using simulation and parameter-recovery approaches, we show that power estimates obtained from detrended spectra conflate these two neurophysiological components, yielding spurious correlations between spectral model parameters. In contrast, modelled rhythmic power from specparam, which detrends the power spectrum and parameterizes brain rhythms, independently recovers the rhythmic and arrhythmic components of simulated neural time series, minimising spurious relationships. We validate these methods using resting-state recordings from a large cohort. Based on our findings, we recommend modelled rhythmic power estimates from specparam for robust, independent quantification of rhythmic and arrhythmic signal components in cognitive neuroscience.
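To make the "parameterize, then isolate the rhythm" idea concrete, here is a heavily simplified sketch in the spirit of specparam's fixed aperiodic mode, not specparam itself. It assumes the peak band is known in advance, fits log10 power = offset − exponent · log10 f only outside that band, and takes the peak height as the maximum of the flattened spectrum rather than a fitted Gaussian. The function name `aperiodic_and_peak` is hypothetical.

```python
import numpy as np


def aperiodic_and_peak(freqs, log_power, peak_band):
    """Simplified aperiodic fit plus modelled peak height.

    freqs: (n,) frequencies in Hz (all > 0).
    log_power: (n,) log10 power spectrum.
    peak_band: (low, high) Hz range assumed to contain the rhythm.
    Returns (offset, exponent, peak height in log10 power units).
    """
    lo, hi = peak_band
    outside = (freqs < lo) | (freqs > hi)
    # Fit log10 P = offset - exponent * log10 f using only bins away from the rhythm
    slope, offset = np.polyfit(np.log10(freqs[outside]), log_power[outside], 1)
    exponent = -slope
    # Flattened spectrum: what remains after removing the aperiodic fit
    flat = log_power - (offset - exponent * np.log10(freqs))
    peak_height = flat[~outside].max()
    return offset, exponent, peak_height
```

With spectra simulated in this additive log-power form, the recovered peak height stays the same when only the aperiodic exponent changes, which is the kind of independence the abstract attributes to modelled rhythmic power.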

18
Ventilator triggering control with an LSTM-Based Model

Liu, J.; Fan, J.; Deng, Z.; Tang, X.; Zhang, H.; Sharma, A.; Li, Q.; Liang, C.; Wang, A. Y.; Liu, L.; Luo, K.; Liu, H.; Qiu, H.

2026-04-11 respiratory medicine 10.64898/2026.04.10.26350573 medRxiv
Top 11%
0.1%

Background: Patient-ventilator synchrony, an essential prerequisite for non-invasive mechanical ventilation, requires accurate matching of every respiratory phase between the patient and the ventilator. Methods: We developed a long short-term memory (LSTM)-based model that predicts the patient's inspiratory and expiratory times. The model consisted of two hidden layers, each with eight LSTM units, and was trained on approximately 27,000 500-ms flow-signal segments capturing both inspiratory and expiratory events. Results: The LSTM model reached 97% for both accuracy and F1 score on the test data, and the average trigger error was below 2.20%. In the first trial, 10 volunteers were enrolled. In "Compliance" mode, 78.6% of the LSTM model's triggers were compatible with neuronal respiration, higher than with the Auto-Trak model (74.2%). The Auto-Trak model performed marginally better in the pressure support modes of 5 and 10 cmH2O. Given the success of the first clinical trial, we further tested the models in five patients with acute respiratory distress syndrome (ARDS). With the LSTM model, 60.6% of triggers fell within the 33%-box, compared with 49.0% for the Auto-Trak model, and the PVI index of the LSTM model was significantly lower than that of the Auto-Trak model (36.5% vs 52.9%). Conclusions: Overall, the LSTM model performed comparably to, or better than, the Auto-Trak model in both latency and PVI index. While other mathematical models have been developed, our model was successfully embedded in a chip to control ventilator triggering. Trial registration: Approval Number: 2023ZDSYLL348-P01; Approval Date: 28/09/2023. Clinical Trial Registration Number: ChiCTR2500097446; Registration Date: 19/02/2025.
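The abstract specifies only the architecture: two stacked LSTM layers of eight units each, applied to 500-ms flow windows. The NumPy forward pass below sketches that shape under loudly stated assumptions. The weights are random and untrained, the 100 Hz sampling rate (50 samples per window) is an assumption, and the logistic head giving P(inspiration) from the last hidden state is hypothetical, as are all function names. The gate ordering [input, forget, candidate, output] is one common convention.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_layer(x, W, U, b):
    """Run one LSTM layer over a sequence.

    x: (T, n_in); W: (4*H, n_in); U: (4*H, H); b: (4*H,). Returns (T, H) hidden states.
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        g = np.tanh(z[2 * H:3 * H])   # candidate cell state
        o = sigmoid(z[3 * H:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def init_layer(rng, n_in, H):
    """Random (untrained) weights for one LSTM layer with H units."""
    return (rng.normal(0.0, 0.1, (4 * H, n_in)),
            rng.normal(0.0, 0.1, (4 * H, H)),
            np.zeros(4 * H))


def predict_phase(flow_window, params, head_w, head_b):
    """Two stacked LSTM layers over one flow window, then a hypothetical
    logistic head on the last hidden state giving P(inspiration)."""
    h = flow_window.reshape(-1, 1)    # (T, 1): one flow value per time step
    for W, U, b in params:
        h = lstm_layer(h, W, U, b)
    return float(sigmoid(head_w @ h[-1] + head_b))
```

A tiny network like this (8 units per layer) has few parameters, which is consistent with the abstract's point that the model could run on an embedded chip.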

19
Individualised evoked response detection based on the spectral noise colour

Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.

2026-04-13 health informatics 10.64898/2026.04.11.26350685 medRxiv
Top 12%
0.1%

Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual variability in noise. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework that enables individualised, real-time response detection by analytically modelling the spectral colour and temporal dynamics of the background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods at a fraction of the computational cost, with well-controlled sensitivity and improved specificity. Importantly, Fmpi incorporates a futility detection mechanism that enables early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.
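As a point of reference only (not the authors' Fmpi method), a classic multi-point variance-ratio detector, commonly called Fmp, compares the variance of the averaged waveform over time with the residual noise variance expected in that average. It assumes white, stationary noise across sweeps, which is exactly the assumption Fmpi relaxes by modelling noise colour. The function name and argument layout are illustrative.

```python
import numpy as np


def fmp(sweeps: np.ndarray, points: np.ndarray) -> float:
    """Variance-ratio detection statistic for an averaged evoked response.

    sweeps: (n_sweeps, n_samples) epochs time-locked to the stimulus.
    points: sample indices at which the across-sweep noise variance is estimated.
    Values near 1 suggest noise only; large values suggest a response is present.
    """
    n = sweeps.shape[0]
    average = sweeps.mean(axis=0)
    # Variance of the averaged waveform over time (response plus residual noise)
    signal_var = average.var(ddof=1)
    # Across-sweep variance at the chosen points, divided by n: noise left in the average
    noise_var = sweeps[:, points].var(axis=0, ddof=1).mean() / n
    return float(signal_var / noise_var)
```

In practice the statistic is compared against an F-distribution threshold; choosing its degrees of freedom is where noise colour matters, and this sketch deliberately leaves that step out.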

20
Ad-verse Effects: Pharmaceutical Advertising Shifts Drug Recommendations by Consumer-Facing AI

Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.

2026-04-16 health policy 10.64898/2026.04.14.26350868 medRxiv
Top 12%
0.1%

Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations, across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising increased selection of the advertised drug by 12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Susceptibility was provider-dependent: Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response sub-analysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.
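A percentage-point (pp) shift is the absolute difference between two selection rates, whereas "2.7 times the baseline rate" is a ratio of rates. The sketch below shows both calculations on hypothetical counts (not the study's data); the function names are illustrative.

```python
def pp_shift(adv_selected: int, adv_total: int, base_selected: int, base_total: int) -> float:
    """Absolute change in selection rate of the advertised drug, in percentage points."""
    return 100.0 * (adv_selected / adv_total - base_selected / base_total)


def rate_ratio(adv_events: int, adv_total: int, base_events: int, base_total: int) -> float:
    """How many times more often an event (e.g. echoing an ad claim) occurs with the ad."""
    return (adv_events / adv_total) / (base_events / base_total)


# Hypothetical: advertised drug chosen in 627/1000 ad runs vs 500/1000 baseline runs
shift = pp_shift(627, 1000, 500, 1000)
```

Going from 50.0% to 62.7% is a 12.7 pp shift but a relative increase of about 25%, which is why the abstract reports the change in percentage points.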